Abstract We investigate Turing's notion of an A-type
artificial neural network. We study a refinement of Tur- 
ing's original idea, motivated by work of Teuscher, Bull, 
Preen and Copeland. Our A-types can process binary 
data by accepting and outputting sequences of binary 
vectors; hence we can associate a function to an A-type, 
and we say the A-type represents the function. There 
are two modes of data processing: clamped and sequen- 
tial. We describe an evolutionary algorithm, involving 
graph-theoretic manipulations of A-types, which searches 
for A-types representing a given function. The algo- 
rithm uses both mutation and crossover operators. We 
implemented the algorithm and applied it to three bench- 
mark tasks. We found that the algorithm performed 
much better than a random search. For two out of 
the three tasks, the algorithm with crossover performed 
better than a mutation-only version. 
1 Introduction

In this paper we report on our investigations into one 
of Alan Turing's contributions to artificial intelligence. 
In 1948 Turing introduced a type of artificial neural 
network (ANN), which he called an A-type unorgan- 
ised machine. Motivated by his work and by work of 
Teuscher, Bull, Preen and Copeland (see Section 2), we
study a refinement of Turing's notion, which we call an 
A-type. 

A-types can be used to process binary data: with 
suitable conventions involving input and output nodes, 
one can input a string of binary vectors into an A-type 
and receive a string of binary vectors as output. Hence 
we can associate a function to an A-type; we say that 
the A-type represents this function. We devised an
evolutionary algorithm (EA) to design an A-type that
represents a given function f. We use a graph-based
representation for A-types, and our EA (in particular,
our crossover operator) is based on graph-theoretic
ideas. We implemented our algorithm and applied it to
three benchmark problems. 

Turing's research on A-types is of great historical 
interest. As the centenary of his birth approaches, it is 
fitting to apply modern ideas — such as the theory of 
non-linear dynamical systems — to his ground-breaking 
work. A-types are an excellent test-bed for these ideas: 
they are composed of neurons with a very simple firing 
rule and they are easy to program, but they are also 
powerful. In this paper we adapt some existing ideas 
such as graph-based chromosomes and sequential in- 
put to the setting of A-types. The use of sequential 
input mode here brings up some new problems which 
motivated us to introduce a new kind of neuron, the
delay node, not originally envisaged by Turing (see
Section 3.5). Our graph-based EA works in the settings of
both sequential and clamped input. 

In Section 2 we give a brief survey of previous work
on A-types. In Section 3 we present our interpretation of
A-types, and we describe our EA in Section 4. Section 5
contains the results of our experimental work.

Our investigations are mainly at the proof-of-concept 
level. Our EA has many parameters and we chose their 
values in an ad-hoc fashion to ensure that solutions 
were quickly discovered reasonably often; we did not 
search systematically for the optimum values (but see 
Sections 5.6 and 6.1).

2 Historical Background and Previous Work 

In 1948 Turing wrote the pioneering technical report 
Intelligent Machinery [31] . In this report he introduced 
a type of ANN which he called an A-type unorganised 
machine.¹ This ANN is discrete, synchronously updated
and, in general, recurrent. It is composed of basic and 
identical neurons (or nodes) each of which performs the 
Boolean operation NAND. The neurons are connected 
by arrows. For any Boolean function f, there exists a
feed-forward A-type unorganised machine A that
'represents' f. That is, there is always an A-type that,
given an input vector of Boolean values x, will output the
vector of Boolean values f(x) (see Section 3.3.1).
Throughout this paper we use the term 'A-types' to refer to
Turing's A-type unorganised machines. We also apply 
this term when we discuss our interpretation of Tur- 
ing's A-type unorganised machines and those of other 
researchers; we hope that the meaning is clear from the 
context. 

In [31] Turing introduced three models of computation:
A-types, B-type unorganised machines, and P-type
unorganised machines. In our research we only use
A-types. However, we mention these other two models 
to explain their relevance to our research. 

The second ANN that Turing introduced was a special
kind of A-type, which he called a B-type unorganised
machine. These networks are effectively A-types whose
arrows can be switched on and off by changing the state
of particular nodes in the network. Turing constructed
these switchable arrows with a particular configuration
of nodes and arrows. In the late 1940s A-types would
have had to be implemented directly in hardware;
Turing's B-type unorganised machines offer a means of
effectively reconfiguring the topology of a network
without reconfiguring hardware.

1 This was seemingly independent [7, p408] of the 1943
paper [17] of McCulloch and Pitts in which ANNs were first
introduced.



Today, ANNs are often implemented in software that is
several levels of abstraction above computer hardware;
however, there may be novel architectures for which
the reconfigurable architecture of a B-type unorganised
machine is useful.

In [31] Turing introduced P-type unorganised ma- 
chines. Unlike a B-type, a P-type is not a special case 
of an A-type (nor is a P-type a generalisation of an A- 
type). Turing used P-types to investigate learning. This 
pioneering work would now be classed as an investiga- 
tion into reinforcement learning. For further details see 
Copeland [5].

Artificial neural networks have found wide applica- 
tion and are an active area of research, yet only a few 
researchers have continued Turing's work on A-types. In 
1996 Copeland and Proudfoot [8] re-examined this
research. The most notable continuation of research into
Turing's networks was conducted in 2001 by Teuscher [29].
Teuscher experimented with A-types with fixed input 
states; for instance, he used A-types in this manner to 
solve basic pattern classification tasks and he showed 
that their dynamics are analogous to those of a non- 
linear oscillator [30]. Teuscher employed EAs to train 
Turing's networks: he used linear data structures (linear 
chromosomes) to represent B-types. Teuscher used B- 
types with lists that prescribed whether each arrow in a 
B-type was in a 'connected' or 'disconnected' state [29,
p88]. These lists give linear chromosomes for Teuscher's
B-types. 

Today, Turing's A-types can be considered a special
class of Random Boolean Networks [29, p25]. Random
Boolean Networks are simple discrete dynamical
systems that are capable of complex behaviour; conse- 
quently, they are useful for modelling complex systems 
such as gene regulation mechanisms in biology and the 
internet [12], [21]. Teuscher investigated the non-linear
dynamics of A-types [29, ch 5], [30]. Recently, Bull [3],
and Bull and Preen [1] investigated the evolution of
A-type machines, and they considered this in the context
of discrete dynamical systems. 



3 Our Interpretation of Turing's A-types 

In this section we present our definition of an A-type. 
This is an interpretation of Turing's A-type unorgan- 
ised machines, and has been influenced by Teuscher's 
research [29]. We also provide illustrations of our
A-types, and we compare our definition with those of
Turing and Teuscher.
3.1 Our Definition

An A-type is a discrete, recurrent, synchronously up- 
dated ANN. The firing rule for every neuron in an A- 
type is invoked simultaneously — we can imagine that 
all neurons are updated via the same clock. Each of 
the instants at which the neurons in an A-type are syn- 
chronously updated is called a moment. 

In order to define A-types, we need the notion of an 
A-type graph. An A-type graph is a directed graph² with
the following properties. Every node has an indegree no 
greater than two. An A-type graph has a non-empty set 
of nodes called input nodes, each of which has indegree 
zero. An A-type graph has a non-empty set of nodes 
called output nodes, each of which has outdegree zero. 
The set of input nodes and set of output nodes do not 
intersect. Arrows from an input node to an output node 
are not permitted. Nodes that are not output nodes 
have no restriction on their outdegree. 

An A-type consists of an A-type graph and a
non-negative integer δ, called the delay time. We interpret
the nodes of the graph as the neurons of an ANN, and 
the arrows of the graph as the interconnections. The 
delay time determines the number of moments from 
when information first enters the input nodes to when 
we start to collect information from the output nodes 
(we elaborate on this in Section 3.3.1). We call the
number of input (output) nodes of an A-type its input
(output) dimension. Because A-types are recurrent, A-type
graphs can have closed paths. 

An A-type is a Boolean ANN. Consider an A-type 
A. Each interconnection of A carries exactly one bit 
of information per moment. That is, we associate a 
Boolean variable with every arrow in the A-type graph 
of A. Every node in A has a firing rule that is a Boolean 
function (of the variables entering that node). Further- 
more, every node in an A-type has a Boolean variable 
associated with it. We call this variable the state of that 
node. In general, the state of a node varies from mo- 
ment to moment. At any moment the output of a node 
is equal to the state of the node. 

We classify every node that is not an input node 
into one of two types depending on its firing rule: nand 
nodes and delay nodes. A nand node q has an indegree 
of two and its firing rule is NAND. That is, let a and b 
denote the Boolean values associated with the respec- 
tive arrows entering q at moment t. At moment (t + 1) 
the state of q is a NAND b. A delay node d has an inde- 
gree of one and its firing rule is the identity. That is, let 

2 When we talk of a directed graph we allow multiple arrows 
in one or both directions between a given pair of nodes and we 
allow loops (arrows with the same source and target nodes). 
Some authors call this a directed multigraph. 



a denote the Boolean value associated with the arrow 
entering d at moment t. At moment (t + 1) the state of 
d is a. A nand node can accept two inputs from a single 
nand or delay node. Note that we initialize the state of 
every non-input node to zero. We explain the rules for 
initialising and updating input nodes in Section 3.3.
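The firing rules above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the dictionary encoding of the graph (node id mapped to the ids of its source nodes) and the name `step` are hypothetical and do not come from the original implementation.

```python
from typing import Dict, Tuple

# Hypothetical encoding: node id -> ids of the nodes feeding it.
# () marks an input node, (a, b) a nand node, (a,) a delay node.
Graph = Dict[int, Tuple[int, ...]]


def step(graph: Graph, state: Dict[int, int]) -> Dict[int, int]:
    """Return the node states at moment t + 1 given the states at moment t."""
    nxt = {}
    for node, sources in graph.items():
        if len(sources) == 2:    # nand node: new state is a NAND b
            a, b = state[sources[0]], state[sources[1]]
            nxt[node] = 1 - (a & b)
        elif len(sources) == 1:  # delay node: new state is a
            nxt[node] = state[sources[0]]
        else:                    # input node: unchanged here, set by the caller
            nxt[node] = state[node]
    return nxt


# Input nodes 0 and 1 feed nand node 2, which feeds delay node 3.
graph = {0: (), 1: (), 2: (0, 1), 3: (2,)}
history = [{0: 1, 1: 0, 2: 0, 3: 0}]  # non-input nodes start at zero
for _ in range(2):
    history.append(step(graph, history[-1]))
```

Every node reads only the states of the previous moment, which is what makes the update synchronous: the nand node changes first, and the delay node repeats that value one moment later.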

3.2 Illustrations 

In graph theory diagrams are employed to represent a 
graph. We use similar diagrams for our A-type graphs. 
Input nodes are represented by circles with no incoming 
arrows. Nand nodes are represented by circles that have 
two incoming arrows. Delay nodes are represented by 
triangles that have one incoming arrow. Output nodes 
are denoted by doubled circles or doubled triangles. We
illustrate these conventions in Figure 1. In Figure 2 we
depict a simple A-type.




Fig. 1 Illustrating the types of nodes in an A-type: (a) an
input node, (b) a nand node, (c) a delay node, (d) output
nodes. Note that for any particular moment the Boolean
values y_1, ..., y_r associated to the arrows exiting a given
node are all identical.




Fig. 2 An A-type graph. 



3.3 Processing Information 

In this section we describe how we employ A-types to 
process information. By a Boolean vector we mean a
vector the components of which are all either 0 or 1.
We denote by S_m the set of all m-component Boolean
vectors. We now explain how A-types can accept and
output sequences of Boolean vectors. 

3.3.1 Input and Output 

Consider an A-type A that has input dimension n. To 
enable us to input information into A we adopt the
following update rule for the input nodes of A. Choose an
ordering on the set of input nodes. Suppose we are given
a sequence of n-component Boolean vectors (x_0, ..., x_q).
For each moment t ≤ q, the states of the input nodes
of A at moment t are given by the components of x_t in
the appropriate order. In particular, the initial states of
the input nodes are determined by x_0. We say that at
moment t, x_t is input into A. We adopt the convention
that after the input vectors are used up the states of the
input nodes remain constant, keeping the values from
the final input vector. That is, for moments t > q the
states of the input nodes are given by the components
of x_q.

Consider an A-type A that has delay δ and output
dimension p. We collect output information from the
output nodes of A, starting not at moment 0 but at
moment δ; the idea is that it may take data some time
to percolate through the A-type and reach the output
nodes. At each moment t ≥ δ the states of the output
nodes of A generate a p-component Boolean vector y_t.
We say that at that moment A outputs y_t.
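The input and output conventions just described can be sketched as follows. The graph encoding, the function name `run` and the parameter `n_out` (how many output vectors to collect) are our own assumptions for illustration; the example network, two nand nodes in a row acting as a double negation, is likewise hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical encoding: () input node, (a, b) nand node, (a,) delay node.
Graph = Dict[int, Tuple[int, ...]]


def run(graph: Graph, input_nodes: Sequence[int], output_nodes: Sequence[int],
        delta: int, xs: Sequence[Sequence[int]], n_out: int) -> List[List[int]]:
    """Feed the vectors xs to the input nodes and collect n_out output
    vectors, starting at moment delta."""
    state = {node: 0 for node in graph}  # non-input nodes start at zero
    ys = []
    for t in range(delta + n_out):
        # Once the inputs are used up, keep repeating the final vector.
        x_t = xs[min(t, len(xs) - 1)]
        for node, bit in zip(input_nodes, x_t):
            state[node] = bit
        if t >= delta:
            ys.append([state[node] for node in output_nodes])
        nxt = dict(state)
        for node, src in graph.items():
            if len(src) == 2:
                nxt[node] = 1 - (state[src[0]] & state[src[1]])
            elif len(src) == 1:
                nxt[node] = state[src[0]]
        state = nxt
    return ys


# Node 1 negates the input, node 2 negates node 1: the input reappears
# at the output node two moments later.
graph = {0: (), 1: (0, 0), 2: (1, 1)}
ys = run(graph, input_nodes=[0], output_nodes=[2], delta=2,
         xs=[[1], [0], [1]], n_out=3)
```

With delay time 2 the collected outputs reproduce the three input vectors; with any other delay time the same network would return a shifted or partly uninitialised sequence.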

A-types can be viewed as non-linear dynamical
systems [22, p132]. Because our A-types accept sequential
data, they are analogous to non-linear oscillators that
are subjected to a driving force that is generally not
constant. Note that when we use an A-type to process
binary data, the delay δ is a parameter which is
independent of the input data. If an A-type is to represent
a sequential function in the sense of Section 3.4 below,
it must have the special property that the time for the
input data to travel to the output(s) should not depend
on the choice of input data.

3.3.2 Clamped Input Mode 

Consider the special case when a single Boolean vector 
x_0 is input into an A-type A. The states of the input
nodes of A stay constant, with values determined by
x_0. In this case we say that the input nodes of A are
clamped by x_0, and we say that we are operating A in
clamped mode.

Consider an A-type A, with delay δ. Let A be clamped
by some input vector x_0. If the states of the output
nodes of A are constant for all moments t ≥ δ then we
say A is clampable with respect to the input x_0. We say
A is clampable if it is clampable for every x_0.



We can operate an A-type in clamped input mode 
even when it is not clampable. Because the graph of 
an A-type is finite, if an A-type is operated in clamped 
mode then eventually the output becomes periodic. For 
a clampable A-type, this period is always one. 
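The periodicity argument can be checked mechanically: with the inputs clamped, the network is a deterministic map on a finite set of states, so some state must recur. The sketch below is illustrative only; the graph encoding, the name `output_period` and the small example network (a nand node with a loop back to itself) are our own assumptions.

```python
def step(graph, state):
    """One synchronous update; input nodes (empty source tuple) stay clamped."""
    nxt = dict(state)
    for node, src in graph.items():
        if len(src) == 2:    # nand node
            nxt[node] = 1 - (state[src[0]] & state[src[1]])
        elif len(src) == 1:  # delay node
            nxt[node] = state[src[0]]
    return nxt


def output_period(graph, input_nodes, output_nodes, x):
    """Eventual period of the output sequence when the inputs are clamped by x."""
    state = {node: 0 for node in graph}
    for node, bit in zip(input_nodes, x):
        state[node] = bit
    seen, outputs, t = {}, [], 0
    while True:
        key = tuple(state[n] for n in sorted(graph))
        if key in seen:      # the whole network state repeats: a cycle
            start = seen[key]
            break
        seen[key] = t
        outputs.append(tuple(state[n] for n in output_nodes))
        state = step(graph, state)
        t += 1
    cycle = outputs[start:]
    # Smallest p dividing the cycle length such that the outputs repeat.
    for p in range(1, len(cycle) + 1):
        if len(cycle) % p == 0 and all(cycle[i] == cycle[i % p]
                                       for i in range(len(cycle))):
            return p


# Node 1 is a nand of the input and of itself; node 2 negates node 1.
graph = {0: (), 1: (0, 1), 2: (1, 1)}
period_when_input_is_1 = output_period(graph, [0], [2], [1])
period_when_input_is_0 = output_period(graph, [0], [2], [0])
```

In this example the output settles when the input is clamped to 0 but oscillates when it is clamped to 1, so the network is not clampable even though it can still be operated in clamped mode.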

We now present an example of a clampable A-type. 
Consider the A-type A_∧, with a delay of δ = 2, shown
in Figure 3. It is easy to check that A_∧ is clampable and
that for every input [a, b]³ ∈ S_2, the eventual output is
a ∧ b.




Fig. 3 A clampable A-type with delay δ = 2.



A Boolean function is a function from S_n to S_p
for some positive integers n and p. Consider a Boolean
function f and a clampable A-type A. We say that A
represents the Boolean function f if the following holds:
for each x ∈ S_n, if A is clamped with respect to the
input x then A outputs the constant sequence of vectors
f(x). For example, logical AND ∧ maps S_2 to S_1 and
it is clear from the discussion in the previous paragraph
that the A-type A_∧ shown in Figure 3 represents ∧. It
can be shown that for any Boolean function f, there
exists a feed-forward A-type without delay nodes that
represents f; Figure 8 illustrates how to construct an
A-type to represent a Boolean function which is built
from ∨ and ∧. In Section 3.4 we generalize our
definition of what it means for an A-type to represent a
function.
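The existence of a feed-forward A-type for every Boolean function rests on the fact that NAND is functionally complete. As a reminder, the standard identities are given below; they are textbook facts rather than part of the original text, and the NAND symbol is typeset with \barwedge, which assumes the amssymb package.

```latex
\begin{align*}
\neg a     &= a \barwedge a, \\
a \wedge b &= (a \barwedge b) \barwedge (a \barwedge b), \\
a \vee b   &= (a \barwedge a) \barwedge (b \barwedge b).
\end{align*}
```

Each level of nesting corresponds to one layer of nand nodes, so a network built directly from these identities needs a delay time of 1 for negation and 2 for conjunction or disjunction.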

3.3.3 Sequential Input Mode 



In Section 3.3.1 we defined a way of inputting
information into an A-type so that A-types can accept and
return sequences of Boolean vectors. We considered
constant input and output sequences in Section 3.3.2. In
general, the sequences that A-types accept need not be
constant.

Consider an A-type A with delay δ, input dimension
n and output dimension p. Recall that at every moment
t, A accepts an n-component Boolean vector x_t, and for
each moment t ≥ δ, A returns a p-component Boolean
vector y_t. We say that we are operating A in sequential
mode.

3 For typographical reasons, we write a Boolean vector in 
row vector form in the text and in column vector form in our 
figures. 
In Figure 4 we illustrate a simple A-type A with
delay δ = 2, and an input sequence of 5 Boolean vectors.
The A-type A returns an output sequence consisting
of 3 Boolean vectors. In Figure 5 we illustrate how A
changes over these moments. Each sub-figure is a
snapshot of the entire A-type at a particular moment. We
give the input Boolean vector, the states of the nodes
of A and the output vector at that moment. In
Figure 6 we illustrate A, the input sequence for the first
five moments, and the output sequence for the first five
moments.⁴




Fig. 4 A sequence of five input vectors, and an A-type with
delay δ = 2. Three output vectors are expected in response
to these input vectors.

Note that (by our convention introduced in
Section 3.3.1) if a sequence of l Boolean vectors is
input into an A-type then for every moment t ≥ l the
states of the input nodes of that A-type are constant.
This convention serves to 'shunt' information through
an A-type. For instance, in Figure 4 the initial input
sequence and the desired output sequence have length
3, but 5 moments are needed to collect the output
because δ = 2. Our shunting convention ensures that the
input states are well-defined for the final two moments.




Fig. 5 Snapshots of the A-type shown in Figure 4 over the
first five moments, t = 0, ..., 4, in panels (a) to (e). The
numbers written inside the nodes are the states of the
nodes at the given moment.



3.4 Representing Sequential Functions 

Here we explain how to associate a function to an
A-type. In Section 3.3.2 we defined what it means for a
clampable A-type operating in clamped input mode to 
represent a Boolean function. Here we generalize this 
notion. 

Let S_{m,l} denote the set of all sequences of length l
consisting of m-component Boolean vectors. Note that
S_{m,1} = S_m. Consider a function f from S_{n,k} to S_{p,l},
for some positive integers k, l, n, and p. We call f a
sequential Boolean function. A Boolean function f is the
special case of a sequential Boolean function with k =
l = 1. We say that an A-type A represents f if for every
x ∈ S_{n,k}, when A accepts x it outputs the sequence
f(x). So if A represents f then the input dimension of
A must be n and the output dimension of A must be p.

4 Note that the rightmost entry of an input sequence is the 
input vector for the earliest moment. 













Fig. 6 From Figure 5 we can determine the response of the
A-type shown in Figure 4. Here we show the full sequence of
input and output vectors for this A-type.

For example, consider serial addition. We can
describe this in terms of a sequential function f_+ which
maps S_{2,l} to S_{1,(l+1)}, for some positive integer l. Given
an input sequence x, the first entries of the vectors in
x give the binary encoding for some integer a, the
second entries of the vectors in x give the binary encoding
for some integer b, and the entries of the vectors in
f_+(x) give the binary encoding for the integer (a + b).
In Section 3.4.1 we describe another class of sequential
functions, the columnwise Boolean functions.
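As a concrete reading of the serial addition function, the sketch below computes the target output sequence directly in Python. It specifies the function only, not an A-type that represents it. The name `serial_add` is hypothetical, and we assume that the least significant bits arrive first, which the text does not state explicitly.

```python
def serial_add(x):
    """Map a sequence of l two-component vectors [a_i, b_i] to a sequence of
    l + 1 one-component vectors encoding a + b.

    Assumption: the first vector carries the least significant bits.
    """
    carry = 0
    out = []
    for a_bit, b_bit in x:
        total = a_bit + b_bit + carry
        out.append([total % 2])  # sum bit for this column
        carry = total // 2       # carry into the next column
    out.append([carry])          # the extra (l + 1)th output vector
    return out


# a = 3 (bits 1, 1) and b = 1 (bits 1, 0), least significant bit first.
result = serial_add([[1, 1], [1, 0]])
```

The carry is the only information passed from one column to the next, which is why the output sequence is one vector longer than the input sequence.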
Let us touch upon the possible functions that our
A-types can represent when operated in sequential
input mode. Recall from Section 3.3.2 that any clamped
Boolean function can be represented by a clampable
feed-forward A-type. We can regard representing
functions in clamped mode as a special case of representing
functions in sequential mode: the input sequences
have length 1 and the output sequences are required to
be constant. Because of this, sequential tasks are
generally more difficult than clamped tasks. In principle,
one can devise an A-type that represents binary
addition of strings s_1 and s_2 of arbitrary length; however, in
practice this is not trivial.⁵ It is impossible [18, p27] to
devise an A-type that represents binary multiplication
of strings s_1 and s_2 of arbitrary length.

3.4.1 Columnwise Boolean Functions

We define a columnwise Boolean function as follows.
Let n, p be positive integers. For any Boolean function
f from S_n to S_p we define columnwise f to be the
function that maps S_{n,k} to S_{p,k} for any positive
integer k, such that if x_i denotes the ith term of an
input sequence and y_i denotes the ith term of the
corresponding output sequence then y_i = f(x_i). This
says that bits of the input in different columns do not
interact with each other. Conversely, if g is a columnwise
Boolean function then we call the underlying Boolean
function clamped g.
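In programming terms, passing from a Boolean function to its columnwise version is simply a map over the input sequence. The short sketch below illustrates this; the names `columnwise` and `xor` are our own.

```python
def columnwise(f):
    """Lift a Boolean function f on vectors to sequences of vectors."""
    def g(xs):
        # The ith output term depends only on the ith input term.
        return [f(x) for x in xs]
    return g


def xor(x):
    """Exclusive OR as a map from 2-component to 1-component vectors."""
    return [x[0] ^ x[1]]


ys = columnwise(xor)([[0, 1], [1, 1], [1, 0]])
```

The difficulty for an A-type is not in the function itself but in timing: the network must deliver f(x_i) at exactly the moment reserved for the ith output.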

For example, let us consider columnwise Exclusive
OR. The Boolean function Exclusive OR ⊕ maps S_2 to
S_1. Columnwise Exclusive OR maps a sequence x of k
2-component Boolean vectors to a sequence y of k
1-component Boolean vectors, such that the ith term of
y is ⊕(x_i), where x_i denotes the ith term of x. It is easy
to check that the A-type shown in Figure 7 represents
columnwise Exclusive OR.

One can show that an A-type representing a columnwise
Boolean function f also represents clamped f, but
the converse is false in general for the reasons discussed
at the start of Section 3.5.

3.5 The Necessity of Delay Nodes 

Operating A-types in sequential mode brings some new
challenges. Data travels through the A-type from
input nodes to output nodes along various paths. If these
paths have different lengths then the arrival times are
not synchronised. In order to represent sequential
functions, it is useful (and, we believe, sometimes
necessary) to have a way to stagger the data. This is why we
introduced delay nodes, which do not appear in Turing's
original notion of an A-type.

5 For example, Minsky [18, p27] describes a McCulloch-Pitts
network that performs serial addition. In principle, these
details could be used to construct an A-type that represents
binary addition of s_1 and s_2.

Fig. 7 An A-type with delay time δ = 3 that represents
columnwise Exclusive OR.

We collected evidence that delay nodes are neces- 
sary for A-types to perform certain sequential tasks. In 
particular, via computer simulations we collected evi- 
dence that supports the following claim. 

Claim: There does not exist an A-type without delay 
nodes that represents columnwise Exclusive-OR. 

We employed a blind search for A-types representing 
columnwise Exclusive OR. We repeatedly constructed 
a random A-type with delay nodes and tested whether 
it represented columnwise Exclusive-OR. Similarly, we 
repeatedly constructed a random A-type without delay 
nodes and tested whether it represented columnwise 
Exclusive-OR. The size of each A-type was randomly 
chosen from the interval [8,40]. A sequence of 10^4
randomly generated 2 × 1 input vectors was used to test
whether an A-type represented columnwise Exclusive-OR:
if an A-type represented columnwise Exclusive-OR
for such an input sequence then it was deemed to do so
for all input sequences. The results of these searches are
presented in Table 1. In summary, we discovered many
A-types with delay nodes that represented columnwise 
Exclusive-OR; however, we failed to find a single A- 
type without delay nodes that represented columnwise 
Exclusive-OR. 

Table 1 The results of our blind searches for A-types that
represent columnwise Exclusive-OR.

With delay nodes
  number of attempts                                        1.6 × 10^10
  probability that a node is constructed as a delay node    20%
  number of solutions                                       1342

Without delay nodes
  number of attempts                                        1.6 × 10^10
  probability that a node is constructed as a delay node    0%
  number of solutions                                       0

It is often the case that an A-type representing a
clamped Boolean function can be modified to represent
the corresponding columnwise Boolean function by
adding some delay nodes. The delay nodes are used to
stagger data flowing through parts of an A-type and
ensure the overall flow is synchronised. For example,
consider again the Boolean function Exclusive-OR. We
can write A ⊕ B as (A ∨ B) ∧ (A ⊼ B), where ⊼ denotes
NAND. From this expression we devise a way to construct
an A-type A_0 that represents clamped Exclusive-OR, using
A-types that represent columnwise Inclusive-OR and
columnwise AND; we illustrate this in Figure 8(a). Next we
construct an A-type A_1 by inserting a delay node into
A_0; we illustrate this in Figure 8(b). This delay node
ensures that data is synchronised as it flows through A_1;
consequently, A_1 represents columnwise Exclusive-OR.
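The effect of the inserted delay node can be reproduced in a small simulation. The sketch below is our own reading of the construction, based on the node sets named in the caption of Figure 8(a): nodes 0 and 1 are the inputs, nodes 2, 3 and 5 compute Inclusive-OR, node 4 computes NAND, and nodes 6 and 7 compute AND. The numbering of the delay node (node 8), the graph encoding and the function name `run` are assumptions made for illustration.

```python
def run(graph, input_nodes, output_node, delta, xs):
    """Operate the network in sequential mode and return one output bit
    per input vector, collected from moment delta onwards."""
    state = {node: 0 for node in graph}
    ys = []
    for t in range(delta + len(xs)):
        x_t = xs[min(t, len(xs) - 1)]  # repeat the final vector when used up
        for node, bit in zip(input_nodes, x_t):
            state[node] = bit
        if t >= delta:
            ys.append(state[output_node])
        nxt = dict(state)
        for node, src in graph.items():
            if len(src) == 2:    # nand node
                nxt[node] = 1 - (state[src[0]] & state[src[1]])
            elif len(src) == 1:  # delay node
                nxt[node] = state[src[0]]
        state = nxt
    return ys


# () input, (a, b) nand, (a,) delay. The path through node 4 is one
# arrow shorter than the path through nodes 2, 3 and 5.
without_delay = {0: (), 1: (), 2: (0, 0), 3: (1, 1), 4: (0, 1),
                 5: (2, 3), 6: (4, 5), 7: (6, 6)}
with_delay = dict(without_delay)
with_delay[8] = (4,)    # assumed position of the delay node
with_delay[6] = (8, 5)  # node 6 now reads node 4 one moment later

xs = [(1, 1), (0, 1), (0, 0), (1, 0)]
expected = [a ^ b for a, b in xs]
synchronised = run(with_delay, [0, 1], 7, 4, xs)
unsynchronised = run(without_delay, [0, 1], 7, 4, xs)
```

On this input sequence the version with the delay node reproduces Exclusive-OR term by term, while the version without it mixes the NAND of one input vector with the Inclusive-OR of the previous one. Both versions agree with Exclusive-OR when a single vector is clamped.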

Can one mimic the effect of the delay node using
only nand nodes? We can formulate this question in
terms of A-types that represent the identity. Suppose
there exists an A-type I_m with a delay δ = m, where m
is an odd positive integer, such that I_m contains no
delay nodes. It is straightforward to find an A-type I_{m-1}
with even delay m − 1 representing the identity function
such that I_{m-1} contains no delay nodes: Figure 12(a)
gives an A-type that works for the special case of delay
2, and we can obtain any even delay by concatenating
copies of this A-type. Let us construct a new A-type as
follows: we insert I_m between nodes 4 and 6, and we
insert I_{m-1} between nodes 5 and 6 (if m − 1 = 0
then we just put a single arrow directly from node 5
to node 6). See Figure 8(c). This ensures that the two
inputs into node 6 are synchronized.

It is clear from the above discussion that if there
exists an A-type I_m as above then we can mimic the
effect of a delay node using only nand nodes. The
converse is also true, since a delay node represents the
identity function with delay 1. This motivates the following
claim. 

Claim: There does not exist an A-type without delay 
nodes and with an odd delay that represents column- 
wise identity. 

The construction illustrated in Figure 8(c) shows that 
if this claim is false then the previous claim is also false. 

We employed a blind search for a counter-example
to the claim. We repeatedly constructed a random
A-type without delay nodes and tested whether it
represented columnwise identity. The size of each A-type was
randomly chosen from the interval [3,20]. For each
A-type a sequence of 10^4 randomly generated 1 × 1 input
vectors was used to test whether the A-type represented
columnwise identity: if an A-type represented
columnwise identity for such a sequence then it was deemed
to do so for all input sequences. The results of these
searches are presented in Table 2. In summary, we
discovered many A-types with even delay that represented
columnwise identity; however, we failed to find a single
A-type with an odd delay that represented columnwise
identity.

(a) Composing an A-type (δ = 4) to represent
A ⊕ B = (A ∨ B) ∧ (A ⊼ B). Note that the subgraph
generated by the node set {0,1,2,3,5} represents (A ∨ B).
Also, the subgraph generated by the node set {0, 1, 4}
represents A ⊼ B. Furthermore, the subgraph generated
by the node set {4, 5, 6, 7} represents AND.

(b) Inserting a delay node into the A-type (δ = 4)
shown in Figure 8(a). This ensures that the two
inputs into node 7 are synchronized.

(c) Generalising the solution shown in Figure 8(b).
We insert two A-types into the A-type shown in
Figure 8(a). First, we insert an A-type I_m with a delay
δ = m, where m is an odd positive integer. Second, we insert
an A-type I_{m-1} with a delay m − 1. This ensures
that the two inputs into node 6 are synchronized.
Consequently, if we can discover an A-type without
delay nodes that has an odd delay then we can
construct an A-type without delay nodes that represents
columnwise Exclusive-OR.

Fig. 8 Using an expression that involves AND ∧, NAND ⊼,
and Inclusive-OR ∨ to generate an A-type that represents
Exclusive-OR ⊕.

Table 2 The results of our blind searches for columnwise
identity.

Solution A-type                         frequency
number of attempts                      34000000000
number of solutions with even delay     971789859
number of solutions with odd delay      0

From the above results, we conjecture that A-types 
with delay nodes operated in sequential mode can rep- 
resent a more general class of function than A-types 
without delay nodes. We found experimental evidence 
that supports our claims, but we were not able to dis- 
cover a formal mathematical proof. We leave this as an 
open problem. 

It is clear from the above discussions that we can 
implement a delay of any length by concatenating the 
following: (a) a single delay node, and (b) an A-type 
with even delay and without delay nodes that repre- 
sents the identity. Hence only a small number of delay 
nodes is needed in any given A-type. 
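The concatenation argument can be illustrated with a path of nodes, each fed only by its predecessor. In the sketch below a "not" node stands for a nand node whose two inputs both come from the preceding node, so a pair of them acts as the identity with delay 2. The function names and the list encoding are hypothetical, and the path is meant as a segment to be spliced into a larger A-type rather than as a complete A-type in its own right.

```python
def identity_chain(m):
    """Node kinds for a path realising the identity with delay m >= 1:
    one delay node if m is odd, followed by pairs of negating nand nodes."""
    if m < 1:
        raise ValueError("delay must be at least 1")
    return ["delay"] * (m % 2) + ["not"] * (m - m % 2)


def run_chain(kinds, bits):
    """Push a bit sequence through the path and collect the state of the
    last node from moment len(kinds) onwards."""
    m = len(kinds)
    states = [0] * m  # every node starts at zero
    out = []
    for t in range(m + len(bits)):
        x_t = bits[min(t, len(bits) - 1)]  # repeat the final bit when used up
        if t >= m:
            out.append(states[-1])
        # Each node reads what its predecessor held at the previous moment.
        previous = [x_t] + states[:-1]
        states = [p if kind == "delay" else 1 - p
                  for kind, p in zip(kinds, previous)]
    return out


bits = [1, 0, 0, 1, 1, 0]
results = {m: run_chain(identity_chain(m), bits) for m in range(1, 7)}
```

For every delay from 1 to 6 the path returns the input sequence unchanged, and no path uses more than one delay node.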



3.6 Comparison with Previous Definitions 

Our definition of an A-type, given above, differs from 
those of Turing and Teuscher. The differences are in our 
allocation of input and output nodes, and our introduc- 
tion of delay nodes. 

Turing did not precisely prescribe how information 
could be input and output for A-types. To address this 
issue Teuscher [22, p32] defined A-types with input and
output nodes. Essentially, we have adopted Teuscher's 
conventions for input and output nodes. 

We introduce delay nodes to allow our A-types to
process sequential input. Neither Turing nor Teuscher
makes use of delay nodes. However, Teuscher [23, p67]
investigates sequential tasks by in effect employing two
clock speeds: one for the rate of information input and
output, and one for the rate of information flow between
neurons. We chose to introduce delay nodes because
they allow a straightforward way to interpret Turing's
A-types so that they can operate with a sequential
input. Furthermore, the training algorithms that we
employ for our A-types are useful in both the clamped and
the sequential settings (see Section 5).

4 Learning via Evolution 

We now turn to our central problem: how to find an
A-type that represents a given function f. We
implemented a reinforcement learning technique involving an
EA which searches for 'suitably small' A-types that
represent f.

In his pioneering paper [31] Turing examines
reinforcement learning. For instance, he defines a P-type
machine to elaborate on some of his ideas. Furthermore,
Turing briefly mentions a 'genetical search', but does
not provide details of such a training method. EAs are a
popular modern reinforcement learning technique,
and their use to train ANNs is now established [10].
Teuscher used EAs to train B-types [29]. We also use
EAs to train A-types.

In this section we outline a simple EA that we
employ, and we present our mutation and crossover
operators. In particular, we describe our efforts to devise
useful crossover operators (see Section 4.4); further
details can be found in [20]. In Section 5 we explain how
we tested our EA and we present the results of these 
tests. Our EA works for A-types in both clamped and 
sequential modes. 

When we implement our EA, we need to assign values
to various parameters. Some of these values are
task-dependent. We give the parameter values in
Section [5].

4.1 Introducing our EA 

In Table [3] we give an outline of our EA. We call this
EA genetic_search_one. This EA is a straightforward
implementation; for example, it is similar to the scheme
outlined in [9, ch 2], and the scheme outlined in [19, ch
9]. Note that genetic_search_one is a steady state EA
in that its population has only a small variation from
generation to generation. In later sections we require
the listing in Table [3] to describe two special cases of
genetic_search_one.

4.2 Chromosomes 

Our candidate solutions are graph-like; this is made ex- 
plicit by our use of an A-type graph to define an A-type. 
An A-type graph can be represented by an adjacency 
list. Teuscher [29, p88] demonstrated that A-types (and
B-types) can be assigned linear chromosomes. If an EA 

Table 3 An outline of the EA, genetic_search_one, that we
use in this paper.

genetic_search_one 

1. Create initial population: Randomly generate a specified 
number of candidate solutions of size within a specified 
range. 

2. Iterate through successive generations: Repeat until ei- 
ther the population contains a solution or a maximum 
number of attempts have been performed. 

(a) perform a set number of crossovers: Repeat a set 
number of times. 

i. parent selection: Select a pair of candidate solu- 
tions as parents. The fitter a candidate solution, 
the greater the probability that it is chosen as a 
parent. 

ii. crossover: For each parent pair combine informa- 
tion from both parents to produce a new candi- 
date solution, which is added to the population. 

iii. survivor selection: Select a member of the pop- 
ulation and delete it from the population. 

(b) perform a set number of mutations: Repeat a set 
number of times. 

i. mutation: Randomly select a member of the pop- 
ulation, copy it, slightly modify the copy, and add 
the modified copy to the population. 

ii. survivor selection: Select a member of the population
and delete it from the population.

3. Return the fittest candidate solution in the population. If 
there is more than one candidate solution with the low- 
est fitness of the population then we randomly select an 
element from the set of such individuals. 



employs linear chromosomes then it is easy to imple- 
ment simple crossover and mutation operators; for ex- 
ample, bit-flipping mutation and one-point crossover 
ch 3]. 

We choose to represent A-types with graph chromo- 
somes because it allows a straightforward implementa- 
tion of some graph-theoretic manipulations on A-types. 
In particular, adding and removing topologically con- 
nected subgraphs from the graph of an A-type becomes 
straightforward. We encode an A-type graph as an ob- 
ject which has a collection of node objects associated 
to it; each node object can reference other node ob- 
jects. This approach has two advantages: it captures 
the topology of an A-type graph, and it does not im- 
pose an artificial ordering on the nodes. 
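As a rough illustration of this object-based encoding, the sketch below stores a graph as a collection of node objects, each of which references the nodes that feed into it. The class names, the `kind` labels and the helper methods are assumptions made for the example; the paper does not specify them.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality keeps nodes hashable
class Node:
    kind: str  # hypothetical labels: "input", "internal", "delay", "output"
    sources: list = field(default_factory=list)  # nodes with an arrow into this node


@dataclass
class ATypeGraph:
    nodes: list = field(default_factory=list)

    def add_node(self, kind):
        node = Node(kind)
        self.nodes.append(node)
        return node

    def add_arrow(self, source, target):
        target.sources.append(source)

    def size(self):
        """Number of nodes, written |A| in the text."""
        return len(self.nodes)

    def neighbours(self, node):
        """Nodes joined to `node` by an arrow in either direction."""
        incoming = set(node.sources)
        outgoing = {n for n in self.nodes if node in n.sources}
        return (incoming | outgoing) - {node}


# Usage: a three-node chain x -> a -> y.
g = ATypeGraph()
x, a, y = g.add_node("input"), g.add_node("internal"), g.add_node("output")
g.add_arrow(x, a)
g.add_arrow(a, y)
```

Because nodes are reached through references rather than positions in a string, no ordering of the nodes is implied, which is the property the paragraph above highlights.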

4.3 The Initial Population 

Our EA requires an initial population of A-types to
be created. To do this, random A-types are generated
with size between a specified upper bound and a specified
lower bound. (We define the size of an A-type $A$
to be the number of nodes it contains, and we denote
this by $|A|$.) The mutation and crossover operators can
change the size of A-types, so subsequent populations
can contain A-types whose size is outside the original
bounds.

4.4 Evolutionary Operators 

Our EA involves mutation and crossover operators. Here 
we describe our implementation of these operators. 

4.4.1 Mutation

Our mutation operator manipulates an A-type graph.
The search space of our EA contains A-types that have
a range of sizes. Consequently, we construct a mutation
operator that can alter the size of an A-type. More
precisely, our mutation operator accepts an A-type $A_{in}$
and returns an A-type $A_{out}$ such that either
$|A_{out}| = |A_{in}| - 1$, or $|A_{out}| = |A_{in}|$, or
$|A_{out}| = |A_{in}| + 1$. We achieve this by copying the
input A-type, $A_{out} \leftarrow A_{in}$, and performing one
of the following three operations. One, a node $n$ is removed
(if possible) from $A_{out}$ and there is a slight
re-arrangement of the graph of $A_{out}$ in order to make
$A_{out}$ into a valid A-type. Two, a single arrow is removed
from $A_{out}$ and a new arrow inserted in order to make
$A_{out}$ into a valid A-type. Three, a node $n$ is added to
$A_{out}$ and arrows are added, and there is a slight
re-arrangement of the graph of $A_{out}$ in order to ensure
that $n$ has an output arrow and $A_{out}$ is a valid
A-type. We illustrate these operations in Figure [9].

4.4.2 Crossover

Our crossover operator involves operations that respect 
the topology of the graphs of the parent A-types. The 
operator exchanges subgraphs of the parents. Only topo- 
logically connected chunks of the parent graphs are 
exchanged, and reconnection of exchanged chunks in- 
volves only the insertion of arrows that bridge the 'bound- 
aries' of these chunks. We make this more precise in 
the following subsections. In this section we present a 
crossover scheme which employs these ideas. In Sec- 
tion [5] we describe our tests of this crossover operator. 

Our crossover operator accepts two parent A-types
♀ and ♂ and returns one child A-type $C$. Two aspects of
this crossover operator require further explanation: the
acceptor and donor subgraphs are graphs of a particular
type, and there are restrictions on the arrows that may
be inserted to reconnect the child $C$. We elaborate on
these two aspects next.

The donor and acceptor are subgraphs of a particular
kind, which we call radial subgraphs. Consider a
graph $G$ and a node $c \in G$, which we call the centre
($c$ is chosen randomly in the crossover operator). If
possible, we construct a radial subgraph of $G$ with $N$

Fig. 9 Three examples of mutation: (b) a smaller mutant; (c) a
mutant of the same size; (d) a larger mutant.



nodes about the centre $c$ as follows. Construct a set $S$
which initially contains only $c$. Let $S'$ denote the set of
all nodes that are adjacent (see footnote 6) to nodes in $S$
but are not already in $S$. We randomly select elements of
$S'$ and transfer them to $S$ until $|S| = N$ or $|S'| = 0$
(where $|S|$ denotes the size of $S$). We repeat the above
process of constructing a set of nodes that are adjacent to
$S$ and selecting from that set until $|S| = N$ or
$|S'| = 0$. At any



6 Two nodes are adjacent if they are the endpoints of a 
particular arrow. That is, two nodes are adjacent if there is 
an arrow connecting them. 



Table 4 Our A-type crossover operator.

Crossover

1. The child $C$ is assigned simply to be a copy of the parent
♀; that is, $C \leftarrow$ ♀.

2. A subgraph of $C$ is chosen; we call this the acceptor $A$.

3. A subgraph of ♂ is chosen; we call this the donor $D$.

4. The subgraph $A$ is removed from $C$ (any arrows bridging
$C - A$ and $A$ are also removed) and a copy of $D$ is inserted
into $C$.

5. Arrows are added to $C$ so that $C$ is a valid A-type:

(a) Inserting arrows from $C - A$ to $D$. For each arrow the
source is randomly selected from the distal boundary
of $A$.

(b) Inserting arrows from $D$ to $C - A$. For each arrow the
source is randomly selected from the proximal boundary
of $D$.



point, if $|S| = N$ then we use $S$ to generate a subgraph
from $G$. This subgraph is the desired radial subgraph.

For each of the acceptor and donor sets, the size
$N$ is a randomly chosen integer between 1 and a fixed
proportion of the total size of the parent graph. The
crossover algorithm always exchanges 'localized' and
connected subgraphs of the graphs of the parents.
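The construction of a radial subgraph can be sketched as follows. This is a minimal sketch, assuming the graph is given as an undirected adjacency mapping (node label to set of neighbouring labels) with sortable labels; the function name `radial_subgraph` and the convention of returning `None` when the frontier empties before $N$ nodes are collected are choices made for the example.

```python
import random


def radial_subgraph(adjacency, centre, n, rng):
    """Grow a connected node set of size n outwards from `centre`.

    Returns the chosen set of nodes, or None if fewer than n nodes
    are reachable from the centre.
    """
    chosen = {centre}
    while len(chosen) < n:
        # Frontier: nodes adjacent to the chosen set but not yet in it.
        frontier = sorted({v for u in chosen for v in adjacency[u]} - chosen)
        if not frontier:
            return None
        rng.shuffle(frontier)
        # Transfer random frontier nodes until the set is full or the
        # frontier is exhausted; then rebuild the frontier and repeat.
        chosen.update(frontier[: n - len(chosen)])
    return chosen


# Usage on a path graph 0 - 1 - 2 - 3 - 4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
middle_three = radial_subgraph(path, 2, 3, random.Random(1))
```

Every node added is adjacent to a node already chosen, so the resulting set always induces a connected subgraph around the centre.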

When our crossover reconnects subgraphs in the
graph of the child, arrows may only be inserted between
boundaries of the acceptor and donor subgraphs. To
explain this process we define two types of boundaries:
proximal boundaries and distal boundaries. Consider a
graph $G$ with a subgraph $S$. Also, let $G - S$ denote the
complement of $S$. The proximal boundary of $S$ is the set
of nodes in $S$ that are adjacent to nodes in $G - S$. The
distal boundary of $S$ is the set of nodes in $G - S$ that are
adjacent to nodes in $S$. For our crossover operator, the
final step of constructing the child requires the insertion
of arrows between the complement of the acceptor
and the donor. Arrows are only inserted between nodes
in the distal boundary of the acceptor and the proximal
boundary of the donor.
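The two boundary definitions translate directly into set comprehensions. The sketch below assumes the same undirected adjacency mapping as a plain dictionary; the function names are illustrative only.

```python
def proximal_boundary(adjacency, sub):
    """Nodes inside `sub` that are adjacent to at least one node outside it."""
    return {u for u in sub if any(v not in sub for v in adjacency[u])}


def distal_boundary(adjacency, sub):
    """Nodes outside `sub` that are adjacent to at least one node inside it."""
    return {v for u in sub for v in adjacency[u] if v not in sub}


# Usage on a path graph 0 - 1 - 2 - 3 - 4 with subgraph {1, 2, 3}.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
inner = proximal_boundary(path, {1, 2, 3})
outer = distal_boundary(path, {1, 2, 3})
```

In this example node 2 belongs to neither boundary, because all of its neighbours lie inside the subgraph.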

We give an outline of our crossover operator in
Table [4]. We give a concrete example of our crossover
operator in Figure [10]. As this example shows, the acceptor
and donor subgraphs can have different sizes; also, the
two parents and the child can all have different sizes.

The use of graph chromosomes is well established [5,
p265]. Of particular relevance to our work is research
conducted by Poli [23], [2T]. He evolved computer pro- 
grams represented by graphs and he used the topology 
of his graphs to devise evolutionary operators. Poli uses 
planar graphs, whereas our A-type graphs need not be 
planar. Poli's crossover operators exchanged connected 
subgraphs of graphs of parents, as do our crossover op- 
erators, although we require that our subgraphs be a ra- 
dial set. To our knowledge these graph-theoretic ideas 

(b) The father ♂ and its donor subgraph $D =
\{1', 2', 3', 4', 5', 6'\}$. The proximal boundary is
$\{1', 5', 6'\}$, and the distal boundary is $\{0', 7'\}$.

(c) Inserting $D$ into ♀ $- A$.

(e) Inserting arrows from $D$ to ♀ $- A$. For each inserted arrow
the source is randomly selected from the proximal boundary
(shaded nodes).

Fig. 10 A concrete example of our crossover operator. The
numbers inside the nodes are labels for the nodes.

have not previously been used to devise evolutionary 
operators for A-types. 



4.5 Fitness Function 

We use a standardized (and normalized) fitness function.
That is, our fitness function returns a real number
between 0 and 1, inclusive. The lower the fitness of an
A-type, the more fit that A-type is. In this section we
define our fitness function.

Recall from the start of Section [4] that we use our
EA to search for 'suitably small' A-types that represent
a particular function $f$. We require training data
$T$ that is a set of input-output pairs of $f$. That is,
$T = \{(x_i, f(x_i))\}$ where $i$ is an element of some
index set. We call each pair in $T$ a training example. We
also require an upper value $u$ for A-type sizes: A-types
larger than $u$ are considered unsuitable solutions. We
call $u$ the penalty bound. Note that in our algorithms,
we always take the value of $u$ to be equal to the upper
bound of the size of A-types in the initial population
(see Section [4.3]).

Consider a candidate solution $A$. We determine the
fitness of $A$ as follows.

1. Determining the performance of $A$ with respect to $T$.
Let $A(x_i)$ denote the output of $A$ given an input $x_i$.
For each training example $(x_i, f(x_i))$ we calculate
the normalized Hamming distance between $A(x_i)$
and $f(x_i)$. Let $d$ denote the average of all of these
Hamming distances.

2. Including a penalty if $A$ is larger than $u$. Choose a
positive real number $m$, which we call the pressure
gradient. If $|A| < u$ then $A$'s fitness is $d$. Otherwise
the fitness of $A$ is $\min\{1,\; d + m(|A| - u + 1)\}$.

Thus our fitness function is a continuous piecewise func- 
tion g. It is initially constant with g = d, then linear 
with a gradient m, then constant with g = 1. This 
enables us to 'pressure' the population so that it is un- 
likely to contain A-types of size much greater than u. 
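A small sketch of this fitness computation is given below. It rests on two assumptions that should be checked against the original definition: the normalized Hamming distance is taken to be the fraction of differing bits over the whole output sequence, and the penalty term is read as $d + m(|A| - u + 1)$, which is the reading consistent with the description of $g$ as linear with gradient $m$. The function names are illustrative.

```python
def normalized_hamming(actual, expected):
    """Fraction of differing bits between two equally shaped sequences
    of bit vectors (assumed definition)."""
    bits_a = [b for vec in actual for b in vec]
    bits_e = [b for vec in expected for b in vec]
    if len(bits_a) != len(bits_e):
        raise ValueError("sequences must have the same shape")
    return sum(a != e for a, e in zip(bits_a, bits_e)) / len(bits_a)


def fitness(outputs, targets, size, u, m):
    """Lower is better. `outputs[i]` is A(x_i) and `targets[i]` is f(x_i);
    `size` is |A|, `u` the penalty bound, `m` the pressure gradient."""
    d = sum(normalized_hamming(o, t)
            for o, t in zip(outputs, targets)) / len(targets)
    if size < u:
        return d
    # Assumed reading of the penalty: linear in size, capped at 1.
    return min(1.0, d + m * (size - u + 1))


# Usage: one training example in which 1 of 4 output bits is wrong.
outs = [[(0, 1), (1, 1)]]
tgts = [[(0, 0), (1, 1)]]
small = fitness(outs, tgts, size=5, u=10, m=0.1)
at_bound = fitness(outs, tgts, size=10, u=10, m=0.1)
huge = fitness(outs, tgts, size=100, u=10, m=0.1)
```

Under this reading an oversized A-type can never reach fitness 0, even when it reproduces every training example exactly.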



4.6 Selection Rules 

In our EA, for each generation we have three operations 
which require A-types to be selected from the popula- 
tion: crossover, mutation, and elimination. In this sec- 
tion we explain how we perform the selections. 

For crossover our parent selection is a fitness pro- 
portional selection. The fitter the A-type the greater 
the probability that that A-type is selected as a parent. 
A-types are chosen by their fitness weighted by a func- 
tion h; we chose h to be an exponential. The choice of 
h was the same for all the tasks we considered. 

For the elimination operation A-types are also chosen
by their fitness weighted by an exponential. However,
the less fit A-types are more likely to be chosen
for elimination.

For mutation our selection operator is random. 
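The selection rules can be illustrated with exponentially weighted sampling. The exact form of the weighting function $h$ is not given in the text, so the sketch below assumes $h(g) = e^{-kg}$ for parent selection and $e^{kg}$ for elimination, where $g$ is the (lower-is-better) fitness and the steepness `k` is a hypothetical parameter.

```python
import math
import random


def parent_weights(fitnesses, k=5.0):
    """Assumed weighting: lower (better) fitness gets a larger weight."""
    return [math.exp(-k * g) for g in fitnesses]


def elimination_weights(fitnesses, k=5.0):
    """Assumed weighting: higher (worse) fitness gets a larger weight."""
    return [math.exp(k * g) for g in fitnesses]


def weighted_index(weights, rng):
    """Draw one population index with probability proportional to its weight."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


# Usage: three candidates with fitness 0.0 (best), 0.5 and 1.0 (worst).
fits = [0.0, 0.5, 1.0]
rng = random.Random(0)
parent = weighted_index(parent_weights(fits), rng)
victim = weighted_index(elimination_weights(fits), rng)
```

Selection for mutation needs no weighting: a uniform draw such as `rng.randrange(len(fits))` suffices.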



4.7 Implementation 

4.7.1 Candidate Solutions

When our EA searches for an A-type $A$ that represents
a given concept function it has to search for both the
graph of $A$ and the delay time $\delta$ of $A$. For each candidate
solution the EA chooses an A-type graph, estimates a
range of possible delays for that graph, and determines
the fitness of each (graph + delay) A-type. That is, each
candidate solution consists of a set of A-types all with
the same underlying A-type graph but with different
delays coming from some interval. So in our algorithm
descriptions when we say that we make an A-type we
are actually making a set of A-types. We chose this
implementation because it is easy to code and efficient
to run.



4.7.2 Estimating a Range of Delays

When our EA constructs an A-type graph $G$ (either a
randomly constructed graph for the initial population,
or the result of crossover or mutation) it must estimate
a suitable range of delays for $G$. Let $N$ denote the number
of nodes in $G$. Let $A$ denote an A-type with graph
$G$ and a delay time $\delta = 0$. The larger the range of delays
for each individual, the longer it takes to train each
individual. We take a somewhat pragmatic approach to
estimate the range of delays. To estimate the minimum
delay we perform the following four steps. First, we input
a random sequence of vectors into $A$. We collect
the output vectors from $A$ and call this sequence $S_{out}$.
Second, we repeat the above step, yielding a second output
sequence $S'_{out}$. Third, we determine the position $q$
where $S_{out}$ and $S'_{out}$ first differ (if $S_{out} = S'_{out}$
then we set $q = -1$). This gives a reasonable estimate of the
minimum possible time for data to percolate through
the network from the input nodes to the output nodes.
Fourth, we subtract the sum of the input dimension of
$A$ and the output dimension of $A$ from $q$. If $q$ is negative
then we set it to zero. Our estimate of the minimum delay
is $q$. We take the maximum delay to be $N$; this gives
a reasonable estimate of the maximum possible time it
can take data to percolate through the network from
the input nodes to the output nodes.
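The four steps can be sketched as below. The A-type itself is abstracted away as a callable `run` that maps an input sequence to an output sequence of the same length; that interface, the 0-based position convention, the use of a fresh random sequence in the second step, and the name `estimate_delay_range` are all assumptions made for the example.

```python
import random


def estimate_delay_range(run, n_nodes, in_dim, out_dim, length, rng):
    """Return (minimum delay estimate, maximum delay estimate)."""
    def random_sequence():
        return [tuple(rng.randint(0, 1) for _ in range(in_dim))
                for _ in range(length)]

    # Steps 1 and 2: feed two random input sequences through the network.
    out_a = run(random_sequence())
    out_b = run(random_sequence())
    # Step 3: first position where the two output sequences differ, else -1.
    q = next((i for i, (a, b) in enumerate(zip(out_a, out_b)) if a != b), -1)
    # Step 4: subtract input and output dimensions, clamping at zero.
    q = max(0, q - (in_dim + out_dim))
    # The maximum delay is taken to be the number of nodes N.
    return q, n_nodes


# Usage with two stand-in "networks" (hypothetical, for illustration only).
def delayed_copy(seq, k=5):
    """Emit zeros for k moments, then replay the input."""
    return ([(0,)] * k + list(seq))[:len(seq)]


def constant_zero(seq):
    return [(0,)] * len(seq)


low, high = estimate_delay_range(delayed_copy, n_nodes=12, in_dim=1,
                                 out_dim=1, length=50, rng=random.Random(3))
flat = estimate_delay_range(constant_zero, n_nodes=12, in_dim=1,
                            out_dim=1, length=50, rng=random.Random(3))
```

When the two output sequences never differ, as with `constant_zero`, the estimate falls back to a minimum delay of zero.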



5 Simulations 

To investigate the performance of our EA we implemented
the algorithm using Java and ran many simulations
with it. Here we describe our simulations and
present our results. Further detail can be found in [20].

5.1 Experimental Method 

Our simulations investigated the performance of our 
EA. We concentrated on two main questions: whether 
our implementation of an evolutionary search is useful, 
and whether our crossover operator aids our EA. 

5.1.1 Comparing Algorithms 

We compared three algorithms: a blind search, a mutation-only
EA, and an EA with crossover. First, we employed
a blind search. This algorithm simply creates a random
A-type, and checks whether it is a solution; if it is not
then it is destroyed and the process is repeated. This is
a very special case of our EA; however, each candidate
solution is entirely independent of all previous candidate
solutions: in the blind search all hereditary information
is lost from one generation to the next. Second,
we employed a mutation-only EA. Asexual evolution
is seen in biology and it is a straightforward special
case of our EA; we simply ensure that no crossovers
are performed. Comparing our EA to the mutation-only
special case offers a test of the efficacy of our
crossover operator. Third, we employed our EA in its
entirety. We name these algorithms blind_search_one,
mutation_search_one, and genetic_search_one respectively.

5.1.2 Benchmark Learning Tasks 

To assess the performance of our EA we chose three
simple supervised learning tasks. These tasks involved
searching for A-types that represent simple classes of
functions: n-identity, n-multiplexer and n-carry. Their
simplicity allowed us to investigate performance of our
algorithm as the complexity of the problem is scaled up.
Also, it is easy to write down exact solution A-types for
each task investigated. In Sections [5.3], [5.4] and [5.5] we
describe each task and the performance of our EA as
it searches for that task. In this section we give further
details of our experimental method.

5.1.3 What We Measured 

To gauge the performance of our algorithms we conducted
several trials. For each trial we recorded the
number of attempts required for a solution to be discovered:
that is, the total number of A-types constructed in
the initial population, via mutation and via crossover.
Note that mutation_search_one constructs one new A-type
in each generation (by mutation), whereas genetic_search_one
constructs two or more (by mutation and crossover).
For this reason we count the number of attempts rather
than the number of generations.

Each learning task that we consider is a class of
concept functions parametrised by a positive integer $n$
(usually $n$ is just the input dimension). For each value
of $n$ we employed three algorithms and with each algorithm
we conducted many trials. To display our results
we present a plot of $n$ versus attempts required. A data
point on these plots represents an average of all trials
for a particular algorithm searching for a particular
concept function. For all trials associated with one data
point we employ Student's t-test (for instance see [TJl
sec 24.6]) to determine a 90% confidence interval. This
determines the error bars displayed around each data
point. We assume that our results are normally distributed,
as is required for the t-test to be valid.



5.1.4 Suitable Training Data 

Although we define A-types to accept and return sequential
data, two of the three concepts that we searched
for are tasks that require A-types to be operated in
the clamped input mode. When we consider n-identity
and n-multiplexer concepts we do so with clamped examples.
This makes our investigations computationally
easier. Conducting numerous trials with several $n$ values
is very computationally expensive if we search for A-types
that operate in the more general sequential mode.
In order to test our EA with A-types that operate in
the sequential mode, we also devised a sequential task,
namely n-carry (see footnote 7).

Performing searches with long training examples takes 
a long time; performing searches with short examples 
usually leads to inexact solutions. Mindful of this we 
adopted the following procedure. We chose relatively 
short training examples to discover possible solutions, 
then tested these possible solutions with longer training 
examples. If a possible solution fits these longer train- 
ing examples then we deem it to be an exact solution 
(see below for more details). 

When we searched for clampable A-types that represented
a function $f$, we used a training set containing all

7 Note that we use A-types with delay nodes for all three 
learning tasks. However, it can be shown that there exist A- 
types without delay nodes that represent clamped n-identity 
and clamped n-multiplexer; see Figure 15.31 for n = 1 and 
n = 2. 

Fig. 11 The training set that we used for our searches for
2-identity. Note that this training set is exhaustive in that
this set contains all possible examples of 2-identity that have
output sequences with length $l = 3$.



possible examples $(x, f(x))$ such that $f(x)$ is a sequence
of three vectors (see footnote 8). That is, when the fitness
of a candidate solution $A$ was assessed with an example
$(x, f(x))$, the sequence containing the first three Boolean
vectors output by $A$ was compared with the sequence $f(x)$.
For example, Figure [11] shows the training set that we used
when we searched for 2-identity.

When we search for clampable A-types that represent
a Boolean function $f$ we define an exact solution
as follows. Let $x_0$ denote a Boolean vector in the domain
of $f$. An A-type $A$ is an exact solution to $f$ if
when $x_0$ is input into $A$, the constant sequence $f(x_0)$
is returned by $A$ for $t$ moments, where $t$ is some large
but fixed positive integer. That is, the output nodes
of $A$ have constant value $f(x_0)$ for $t$ moments starting
from moment $\delta$. When searching for A-types that represent
clamped n-identity and clamped n-multiplexer,
we deemed solutions to be exact when $t = 1000$.

When searching for A-types that represented a sequential
function, our training data contained a single
example $(x, f(x))$, where $x$ was a random sequence of
Boolean vectors. We chose $x$ to be short so that solutions
would often be found relatively quickly. As with
the clamped case, to cater for the chance that a discovered
solution is incorrect we defined exact solutions
for sequential searches. We deemed a solution to be exact
if it represents a training example $(x, f(x))$ where $x$
consists of a random sequence of $10^4$ Boolean vectors.

5.1.5 Other Search Parameters 

For each of the three algorithms tested there are sev- 
eral parameters that require arguments; for instance, 
the population size, and the mutation to crossover ra- 
tio. To optimize each algorithm we need to search for 
appropriate arguments; furthermore, these arguments 

8 With the exception of n-identity when $n \in \{7, 8, 9, 10\}$;
see Section [5.3].

Table 5 Parameters common to all three investigations
in this section. Note that we set the penalty bound $u$ equal
to the upper bound for the size of A-types in the initial
population. This is specific to each learning task.

when using any algorithm

parameter                                         argument
population size                                   100
worst fitness of a solution                       0.00
probability that a node is constructed as a
delay node                                        20%
penalty bound
pressure gradient

when using genetic_search_one

parameter                                         argument
crossovers per generation                         1†
mutations per generation                          1†
upper bound of size (% of internal nodes of
parent) of donor or acceptor subgraphs for
crossover                                         80%

† The number of crossovers and number of mutations per
generation are allowed to vary in Section [5.6].



may be specific to each benchmark concept. We performed
some rather informal investigations to decide
upon arguments for these parameters. Those common
to all three tasks are presented in Table [5]. Further details
are presented as we introduce the investigations
for each concept.

5.1.6 Task Management 

We conducted our investigations using many cores of
numerous computers. Consequently, we had to minimize
any bias that this introduced into our results. For
each learning task we considered a set of concept functions,
each of which had a particular value of $n$. When
we searched for a concept function $f$ we used a set of a
suitable number of training examples $(x, f(x))$. In the
cases where we did not use an exhaustive set of training
examples, we randomly selected a suitably sized training
set from all possible examples. However, we ensured
that the training examples remained constant as the
training algorithms varied. That is, when we searched
for an A-type representing a concept function $f$, the $i$th
trial using each algorithm had the same set of training
examples. The processing time may vary from computer
to computer, but the number of attempts required
should remain constant.

For both n-identity and n-multiplexer, for each in- 
teger n tested we performed at least twenty trials for 
each algorithm. The exception to this is for some blind 
searches, because on occasions the blind search took an 
excessively long time to complete. We note below when 
twenty trials were not performed for the blind search. 



5.2 Actual Solutions 

In this section we give examples of solutions obtained
by our algorithms. In Figure [12] we present some of the
solutions found when we employed mutation_search_one
to search for the clamped identity function with one input
and one output (the clamped 1-identity function in the
language of Section [5.3]). The details of the search are given
in Section [5.3]. We found simple solutions without delay
nodes; see Figure 12(a). We found solutions with
subgraphs that did not contribute to the output of the
solutions; see Figures 12(b) and 12(c). Note that such
subgraphs may be considered 'junk'; however, A-types
with such subgraphs may prove to be useful intermediary
forms in an algorithm based on a population of
A-types. In Section [3.3.2] we explained that there always
exists an A-type without delay nodes that represents
a given clamped function. However, for all simulations
in this section we used A-types with delay
nodes. Consequently, we found solutions that involve
delay nodes; see Figure 12(d). Because A-types that
represent clamped functions do not require the synchronisation
of data, we found solutions having paths
of differing lengths from the input node to the output
node; see Figures 12(e) and 12(f).



5.3 Searching for Clamped n-Identity 

The first class of concept functions that we consider
is clamped n-identity. Given a positive integer $n$,
n-identity is the Boolean function from $\mathbb{B}^n$ to
$\mathbb{B}^n$ that maps each $n$-component Boolean vector to
itself. In Figure [13] we illustrate two examples (found by
inspection) of A-types that represent clamped n-identity;
note that these also represent the more general function
columnwise n-identity.

In this section we describe our searches for A-types
that represent n-identity for values of $n$ that range from
1 to 10. In Table [6] we list the arguments that we chose
for this search. When we searched for n-identity we employed
all examples with output sequences of length 3
unless there were more than 100 of these (this was the
case when $n \in \{7, 8, 9, 10\}$). In the latter case we randomly
chose 100 examples for each trial. Our choice of
training data almost always gave exact solutions and,
as described above, we ensured that this choice was not
a variable when we compared our algorithms.
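The construction of this training data can be sketched as follows. The representation of a training example as a pair (input vector, list of three expected output vectors) is an assumption made for illustration, as are the function and parameter names; the cap of 100 examples and the output length of 3 come from the text.

```python
import itertools
import random


def identity_training_set(n, length=3, cap=100, rng=None):
    """Clamped n-identity examples (x, [x, x, x]).

    All 2**n examples are used unless there are more than `cap`, in
    which case `cap` of them are drawn at random without replacement.
    """
    rng = rng or random.Random()
    vectors = list(itertools.product((0, 1), repeat=n))
    if len(vectors) > cap:
        vectors = rng.sample(vectors, cap)
    return [(x, [x] * length) for x in vectors]


# Usage: exhaustive for n = 2, sampled for n = 7 (2**7 = 128 > 100).
two = identity_training_set(2)
seven = identity_training_set(7, rng=random.Random(0))
```

Since $2^6 = 64$ and $2^7 = 128$, the cap first takes effect at $n = 7$, which matches the range $n \in \{7, 8, 9, 10\}$ stated above.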

In Figure [14] we compare the performances of blind_search_one
and mutation_search_one when searching for A-types
that represent n-identity with $n$ ranging from 1 to 10.
Note that we do not display results for blind_search_one
for $n > 4$. This is because all trials using blind_search_one

(a) The simplest A-type ($\delta = 2$) without delay nodes.

(b) This A-type ($\delta = 2$) has a redundant node,
namely the internal node without outgoing arrows.
We may consider this node as 'junk' because it does
not contribute to the output of the A-type.

(c) This A-type ($\delta = 2$) also has a redundant node,
namely the disconnected internal node.

(d) Although unnecessary, we include delay nodes
in our searches for clamped n-identity. Consequently
we find solutions that involve delay nodes, as shown
here ($\delta = 2$).

(e) The domain of a clamped function contains only
constant sequences. Consequently information may
flow unsynchronised through an A-type solution;
this is the case for the A-type shown here ($\delta = 2$).

(f) An A-type ($\delta = 4$) whose graph has a closed
directed path.

Fig. 12 Some A-type solutions discovered when genetic_search_one
was employed to search for clamped identity.

(a) An A-type, with $\delta = 2$, that represents 1-identity.

(b) An A-type, with $\delta = 2$, that represents 2-identity.

Fig. 13 Two A-types that represent the Boolean function
n-identity for $n \in \{1, 2\}$. Note that these also represent the
more general function columnwise n-identity.

Table 6 Parameters used for our clamped n-identity
searches.

parameter                                   argument
lower bound for size of initial machines    3n
upper bound for size of initial machines    in
max num of attempts                         $10^9$
trials per training example                 30
length for exact solution                   $10^3$

Fig. 14 Searching for A-types that represent n-identity with
blind_search_one and mutation_search_one. Here we show the
average number of attempts required before a solution was
discovered.


failed to find a solution within $10^9$ attempts. These results
show that mutation_search_one outperforms blind_search_one
by orders of magnitude.

In Figure [15] we compare the performances of
mutation_search_one and genetic_search_one when searching
for A-types that represent n-identity with $n$ ranging
from 1 to 10. These results show that genetic_search_one
significantly outperforms mutation_search_one.

In conclusion, the results in this section provide ev- 
idence that when our EA searches for n-identity it sig- 
nificantly outperforms a blind search. They also pro- 
vide evidence that when our EA searches for clamped 
n-identity, our crossover operator aids our EA. 

Fig. 15 Searching for A-types that represent n-identity with
genetic_search_one and mutation_search_one. Here we show
the average number of attempts required before a solution
was discovered.

Table 7 Parameters used for our clamped n-multiplexer
searches. Note that for the lower bound we devised the following
function $l$ of $n$: $l(2) = 7$, $l(3) = 13$, $l(4) = 18$, $l(5) = 24$,
by examining solutions constructed by concatenating copies
of our 2-multiplexer.



parameter 



argument 



lower bound for size of initial machines l(n) 

upper bound for size of initial machines l(n) + 4 

max num of attempts 10 s 

trials per training example 20 

length for exact solution 10 3 



5.4 Searching for Clamped n-Multiplexer 

The second class of concept functions that we consider
is clamped n-multiplexer. An A-type $A$, with delay $\delta$,
that represents n-multiplexer has $n$ regular input nodes
$(x_1, \ldots, x_n)$ and $\log_2(n)$ (rounded up to the next integer)
extra input nodes called selector pins $s_j$. Consider
the input on the selector pins of $A$ at moment $t$. This
gives a binary representation of some integer $i$. At moment
$(t + \delta)$ the output of $A$ is equal to the value of $x_i$
at moment $t$.
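As a reference point, the target function itself (ignoring the delay $\delta$) can be written down directly. The sketch below makes several assumptions not fixed by the text: the selector bits are read most significant bit first, data inputs are indexed from 0, and selector values that address no data input return `None`. The function names are illustrative.

```python
def selector_count(n):
    """Number of selector pins: log2(n) rounded up to the next integer."""
    return (n - 1).bit_length()


def multiplexer(data, selectors):
    """Return the data bit addressed by the selector bits.

    Assumptions: selectors are most-significant-bit first, data bits
    are indexed from 0, and unaddressable selections yield None.
    """
    if len(selectors) != selector_count(len(data)):
        raise ValueError("wrong number of selector pins")
    index = int("".join(str(b) for b in selectors), 2)
    if index >= len(data):
        return None
    return data[index]


# Usage: a 3-multiplexer has 2 selector pins; selector (1, 0) picks index 2.
picked = multiplexer((1, 0, 1), (1, 0))
```

For the values of $n$ considered in this section, `selector_count` gives 1, 2, 2 and 3 pins for $n = 2, 3, 4, 5$ respectively.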

Several researchers have applied EAs to the task of
discovering multiplexers. This started with Wilson [32]
and others have also investigated this task, for example
Koza [J3J ch 7] and Butz [5, ch 3]. In particular, Bull
and Preen [I] used simulated evolution to design clampable
A-types that represent clamped n-multiplexers.
Although n-multiplexer is more complex than n-identity,
it is another class of problem that scales easily.

In this section we describe our searches for A-types
that represent n-multiplexer for values of $n$ that range
from 2 to 5. In Table [7] we list the arguments that we
chose for this search. In Figure [16] we illustrate A-types
(found by inspection) that represent n-multiplexer where
$n \in \{2, 3\}$; note that these also represent columnwise
n-multiplexer.




(a) An A-type, with a delay $\delta = 3$, that represents
2-multiplexer.

(b) An A-type, with a delay $\delta = 6$, that represents
3-multiplexer.

Fig. 16 A-types that represent n-multiplexer for $n \in \{2, 3\}$.



In Figure [17] we compare the performances of
blind_search_one and mutation_search_one. When we used
blind_search_one to search for A-types that represent
3-multiplexer, only two of the twenty trials returned a
solution (before $10^8$ attempts). We include the data
point corresponding to n = 3 for blind_search_one as a
lower bound; that is, we expect that had we allowed a
greater maximum number of attempts, the point
corresponding to 3-multiplexer for blind_search_one would
be greater than that shown. These results show that
mutation_search_one significantly outperforms
blind_search_one.

In Figure [18] we compare the performances of
mutation_search_one and genetic_search_one. These results
show that when searching for n-multiplexers for
$n \in \{2, 3, 4, 5\}$ there is no conclusive difference
between the performance of our two EAs. Considering the
relative positions of the means of the trials for
5-multiplexer, we speculate that as n increases the
crossover operator may prove to be beneficial when our EA
searches for n-multiplexers.

Fig. 17 Searching for A-types that represent n-multiplexer
with blind_search_one and mutation_search_one. Here we
show the average number of attempts required before a
solution was discovered.

Fig. 18 Searching for A-types that represent n-multiplexer
with genetic_search_one and mutation_search_one. Here we
show the average number of attempts required before a
solution was discovered.

In conclusion, the results in this section provide ev- 
idence that when our EA searches for n-multiplexer 
it significantly outperforms a blind search. However, 
they fail to provide strong evidence that when our EA 
searches for n-multiplexer our crossover operator aids 
our EA. 



5.5 Searching for Sequential n-Carry 

In the previous two sections we searched for clampable 
A-types. The third class of concept functions that we 
consider is of a different kind to those previously con- 
sidered: it does not consist of columnwise Boolean func- 
tions. We devised this class of functions to investigate a 
sequential task that has no clamped analogue. We call 
this class of functions n-carry. Informally, n-carry maps 
a single bit string to a set of n bit strings; each of these 
output strings is a segment of the input string. More 
formally, for some positive integer n and some integer 



Fig. 19 Four input-output pairs of 2-carry. 



Table 8 Parameters used for our sequential n-carry searches.

parameter                                   argument
lower bound for size of initial machines    $3 + 2(n - 1)$
upper bound for size of initial machines    $3 + 2n$
max num of attempts                         $10^9$
trials per training example                 20*
length of exact solution                    $10^4$

* 10 for the blind search.



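$l > n$, n-carry is a function $f_n$ from $S_{1,l}$ to
$S_{n,l-n+1}$ (see footnote 9) such that each sequence

$$x = ([a_l], [a_{l-1}], \ldots, [a_1])$$

is mapped to $f_n(x)$, a sequence of $l - n + 1$ Boolean
vectors with $n$ components each.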



For example, in Figure [19] we show four input-output
pairs for 2-carry. In Figure [20] we illustrate two
examples (found by inspection) of A-types that represent
n-carry.
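
To make the shape of n-carry concrete, here is a minimal Python sketch of one function that fits the informal description: a single bit string of length $l$ goes in, and $l - n + 1$ vectors of $n$ bits come out, so that each of the $n$ output strings is a segment of the input. The sliding-window layout, the ordering of bits within each vector, and the names `n_carry` and `component_strings` are assumptions made for illustration; the precise definition should be read from the formal statement and from Figure [19].

```python
from typing import List, Sequence, Tuple


def n_carry(bits: Sequence[int], n: int) -> List[Tuple[int, ...]]:
    """Map a bit string of length l to l - n + 1 vectors of n bits each.

    Vector k holds the n consecutive input bits starting at position k
    (a sliding window; this layout is an assumption of the sketch).
    """
    l = len(bits)
    # The text takes l > n; this sketch also accepts l == n, which
    # yields a single output vector.
    if n < 1 or l < n:
        raise ValueError("need 1 <= n <= len(bits)")
    return [tuple(bits[k:k + n]) for k in range(l - n + 1)]


def component_strings(vectors: Sequence[Tuple[int, ...]]) -> List[Tuple[int, ...]]:
    """Read the output column-wise: one bit string per vector component."""
    width = len(vectors[0])
    return [tuple(v[j] for v in vectors) for j in range(width)]
```

With the input `[1, 0, 1, 1, 0]` and `n = 2`, the sketch produces four vectors whose two component strings are `(1, 0, 1, 1)` and `(0, 1, 1, 0)`, each a length-4 segment of the input.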

We searched for A-types that represent n-carry for 
values of n that range from 1 to 8. For each n-carry 
search we chose a training example with a random input 
sequence of length 50. For each value of n we conducted 
20 trials per algorithm and the training example for the 
ith trial was the same for all algorithms. In Table [8] we
list the arguments that we chose for this search.
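
The pairing of trials across algorithms described above can be arranged by deriving each training input from the pair (n, trial index) alone. The sketch below shows one way to do that in Python; the function name `training_input`, the `base_seed` parameter and the seeding scheme are hypothetical and are not the authors' implementation. Only the sequence length of 50 and the 20 trials per algorithm come from the text.

```python
import random
from typing import List

SEQUENCE_LENGTH = 50   # input length stated in the text
TRIALS = 20            # trials per algorithm stated in the text


def training_input(n: int, trial: int, base_seed: int = 0) -> List[int]:
    """Return the random input bit string for a given n and trial index.

    The same (n, trial) pair always yields the same string, so every
    algorithm can be run on an identical training example in trial i.
    """
    # Seeding scheme is an assumption: one independent generator per pair.
    rng = random.Random(f"{base_seed}-{n}-{trial}")
    return [rng.randint(0, 1) for _ in range(SEQUENCE_LENGTH)]
```

Each algorithm would then loop over `range(TRIALS)` and call `training_input(n, trial)`, applying the concept function to obtain the target output for that trial.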

In Figure [21] we compare the performances of
blind_search_one and mutation_search_one as they search
for n-carry, for n ranging from 1 to 8. Note that when
using blind_search_one all trials for n > 4 failed to find
a solution. From these two figures we see that
mutation_search_one significantly outperforms
blind_search_one.

9 Recall from Section [3.4] that we defined $S_{m,l}$ to
denote the set of all sequences of length $l$ consisting of
m-component Boolean vectors.

(a) An A-type, with a delay $\delta = 3$, that represents
2-carry.

(b) An A-type, with a delay $\delta = 4$, that represents
3-carry.

Fig. 20 A-types that represent n-carry for $n \in \{2, 3\}$.

Fig. 21 Searching for A-types that represent n-carry with
blind_search_one and mutation_search_one. Here we show the
average number of attempts required before a solution was
discovered.

In Figure [22] we compare the performances of
genetic_search_one and mutation_search_one as they search
for n-carry. These results show that genetic_search_one
significantly outperforms mutation_search_one.

In conclusion, the results in this section provide ev- 
idence that when our EA searches for n-carry it signif- 
icantly outperforms a blind search. They also provide 
evidence that when our EA searches for n-carry our 
crossover operator aids our EA. 



5.6 Parameter Bias 

The above results suggest that our crossover operator is
useful; however, we must be mindful that
genetic_search_one has many parameters that require
arguments for a particular search. Because our
investigations were a 'proof of concept' we simply chose
parameter values that ensured that we found solutions.
These values were held constant as we varied the
algorithms.

We did investigate the effect of varying the (crossovers
per generation) : (mutations per generation) ratio in
genetic_search_one when searching for 7-carry. The other
parameter values for these simulations were those
specified in Tables [5] and [5]. The results are presented
in Figure [23]. A (crossovers per generation) : (mutations
per generation) ratio of 1 : 1 gave the best performance
of the ratios tested. Note that in the special case when
the ratio is 0 : 1, genetic_search_one is effectively the
same as mutation_search_one.

Fig. 22 Searching for A-types that represent n-carry with
genetic_search_one and mutation_search_one. Here we show
the average number of attempts required before a solution
was discovered.

Fig. 23 Searching for A-types that represent 7-carry using
genetic_search_one and various values of the ratio
(crossovers per generation) : (mutations per generation).
Here we show the average number of attempts required
before a solution was discovered. Horizontal axis:
crossovers per generation : mutations per generation, with
tick values 1:0, 100:1, 10:1, 1:1, 1:10, 1:100 and 0:1.



5.7 Is our Crossover Simply Macromutation?

The above results provide evidence that our A-type
crossover operator is useful. However, we have yet to
investigate whether this is simply because our crossover
operator is a 'macromutator'; that is, whether our
crossover operator is only useful because it mixes the
population more effectively than our mutation operators.
We turn to this question now.

The results from the n-identity searches and the
n-carry searches demonstrate that for some tasks the
crossover of our EA is useful. In many EAs crossover is
useful because it provides sudden large variation in the
population, rather than because it recombines individuals
[ch 6]. Such an operator is called a macromutation
operator. This is not the case in biology: the utility of
biological crossover is due to its ability to recombine
individuals' information [Hi p276].

The 'headless chicken' search offers a relatively simple
means of testing whether a crossover operator is simply
acting as a macromutator [11], [33]. The headless chicken
search is an EA where only one parent is selected from the
population and the other parent is an entirely new
individual [2, p153]. We implemented the headless chicken
algorithm by duplicating genetic_search_one with the
following modification. For each crossover, after we have
selected the parents ♀, ♂ we randomly choose one parent P
and then construct a random A-type P' that is the same
size as P. We then perform the crossover using P' and the
other parent.

We compare genetic_search_one and our headless chicken
search for clamped n-identity and n-carry, the benchmark
tasks that demonstrated the utility of our crossover.
Figure [24] shows that when searching for clamped
n-identity, genetic_search_one outperforms our headless
chicken search. Figure [25] shows that when searching for
n-carry, genetic_search_one also outperforms our headless
chicken search.

This provides evidence that for some tasks our crossover
operator is more useful than a macromutation operator.

Fig. 24 Searching for A-types that represent n-identity with
genetic_search_one and our headless chicken crossover search.
Here we show the average number of attempts required before
a solution was discovered.

Fig. 25 Searching for A-types that represent n-carry with
genetic_search_one and our headless chicken crossover search.
Here we show the average number of attempts required before
a solution was discovered.



5.8 Size Bias

We now briefly turn to the size of solutions obtained
by different algorithms. Consider n-carry, for example.
The graph in Figure [26] shows that there is not a great
difference between the solution sizes found by
mutation_search_one and genetic_search_one. Hence the
difference in performance of these algorithms is not due
to size differences in the populations.

More generally, one can consider the diversity of the
population as the algorithm progresses. It can be seen
from Figure [26] that the algorithms found solutions of
different sizes for each fixed value of n; in particular,
these solutions were not all the same. This indicates
the presence of at least some diversity. We did not
investigate population diversity systematically. See also
Figure [T^J which shows a sample of solutions obtained
by using mutation_search_one to search for A-types that
represent 1-identity.

Recall from Section WM that our method for fitness-based
selection employs an exponential function. This strongly
favours fitter individuals, which may reduce the diversity
of the population. Our choice of exponential sufficed for
our algorithm comparisons. One advantage of our method for
fitness-based selection is that it would be easy to vary:
one can replace the exponential with any other monotone
function.

Fig. 26 The sizes of A-types found to represent n-carry using
mutation_search_one and genetic_search_one. Again, the error
bars show the 90% confidence interval using Student's t-test.



6 Conclusion

We devised a graph-based EA for finding A-types that
represent a given function. When applied to the three
benchmark problems, the EA performed considerably
better than a purely random search. For clamped
n-identity and n-carry, the full version of the EA
performed better than the mutation-only version. Our
algorithm worked in both the clamped and the sequential
settings.


We now suggest directions for future research. A- 
types are relatively simple, yet they are recurrent Boolean 
ANNs capable of representing any Boolean function 
and operating in a sequential mode. Consequently, we 
believe A-types are a useful tool for investigating the 
learning and behaviour of Boolean ANNs. In particu- 
lar, the simplicity of A-types means that manipulations 
of their graphs are often straightforward to implement. 
We suggest two areas of future research with A-types: 
further investigations into evolutionary techniques, and 
using the symmetry of a concept function to improve 
the search for an A-type that represents that function. 



6.1 Evolving Evolutionary Operators 

Here we propose that it is worthwhile to continue to 
search for useful evolutionary operators for A-types. 
Furthermore, we propose that evolutionary searches can 
be applied to discover these operators. The evolution 
of parameters of a search is an established technique 
in evolutionary computing [5, ch 4]. Many researchers
have extended this idea to include the evolution of evo-
lutionary operators [26]. In terms of evolving networks,
Teller's research [28] [27] is of particular interest. Teller 
solved signal classification tasks by evolving two pop- 
ulations simultaneously. One population was a set of 
programs, which were represented with graphs. The 
other population was a set of evolutionary operators 
that operated on the programs. We believe that it would 
be worthwhile to co-evolve evolutionary operators in a 
manner analogous to Teller's research. This would allow 
a more complete investigation of what happens when 
one varies the many parameters in our EA. 

Fig. 27 Redrawing the A-type, $\delta = 3$, shown in Figure [7]
to emphasise the mirror symmetry of the solution.

The results in Section [5] show that, for some problems,
our crossover operator is more useful than a
macromutation operator. Although our crossover operator
employs relatively simple graph-theoretical ideas, its
implementation is rather involved. By evolving
evolutionary operators for A-types, one may be able to
find more complicated but better-performing A-type
crossover operators and test whether certain properties
(such as the out-degree of nodes, connectedness of
subgraphs, network activity as defined in footnote 10,
and perhaps some measure of symmetry) are useful.
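
Any evolved crossover operator could be screened with the same headless chicken test used in Section 5.7. The Python sketch below shows the shape of that test under loudly stated assumptions: a machine is reduced to a list of input-index pairs (one pair per two-input node, ignoring the input and output node conventions of our A-types), the names `AType`, `random_a_type` and `headless_chicken` are hypothetical, and the crossover operator is passed in as a parameter because the graph-based operator itself is not reproduced here.

```python
import random
from typing import Callable, List, Tuple

# Simplified stand-in for an A-type: node k reads from the two nodes
# whose indices are stored in machine[k]. This encoding is an assumption.
AType = List[Tuple[int, int]]
Crossover = Callable[[AType, AType, random.Random], AType]


def random_a_type(size: int, rng: random.Random) -> AType:
    """Build a random machine with `size` nodes, each wired to two random nodes."""
    return [(rng.randrange(size), rng.randrange(size)) for _ in range(size)]


def headless_chicken(mother: AType, father: AType, crossover: Crossover,
                     rng: random.Random) -> AType:
    """Cross one selected parent with a random stand-in for the other.

    One of the two selected parents is chosen at random and replaced by a
    fresh random machine of the same size; the supplied crossover is then
    applied to the stand-in and the untouched parent.
    """
    if rng.random() < 0.5:
        mother = random_a_type(len(mother), rng)
    else:
        father = random_a_type(len(father), rng)
    return crossover(mother, father, rng)
```

If an operator performs no better inside `headless_chicken` than it does with two selected parents, that suggests it is acting mainly as a macromutator.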



6.2 Making Use of Symmetry 

The notion of symmetry, which is made precise by group
theory, leads to useful problem-solving techniques.
Consider the A-type shown in Figure [7], which represents
columnwise Exclusive-OR. This function is symmetric
in its arguments: that is, $A \oplus B = B \oplus A$ for all
A and B. In Figure [27] we redraw this A-type to show that
it has 'mirror symmetry' about a horizontal line. So
columnwise Exclusive-OR has a symmetry; when searching for
an A-type that represents it, both the concept function
and one of its solutions share this property. We
hypothesise that this idea can be formalised using group
theory for a class of concept functions admitting a
symmetry and used to cut down the size of the search space
of an EA. This is work in progress.
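
Whether a concept function admits a symmetry in its arguments can be checked by brute force before any group-theoretic machinery is brought in. The following Python sketch enumerates the permutations of argument positions that leave a Boolean function unchanged; the name `argument_symmetries` is hypothetical, and exhaustive enumeration is an illustrative choice that is only practical for small arities.

```python
from itertools import permutations, product
from typing import Callable, List, Tuple


def argument_symmetries(f: Callable[..., int], arity: int) -> List[Tuple[int, ...]]:
    """Return every permutation of argument positions that leaves f unchanged."""
    symmetries = []
    for perm in permutations(range(arity)):
        # f is invariant under perm if permuting the inputs never changes the output.
        if all(f(*bits) == f(*(bits[p] for p in perm))
               for bits in product((0, 1), repeat=arity)):
            symmetries.append(perm)
    return symmetries


def xor(a: int, b: int) -> int:
    """Exclusive-OR of two bits."""
    return a ^ b
```

For `xor` the sketch returns both the identity and the swap of the two arguments, whereas for a function such as implication only the identity survives. The permutations returned always form a subgroup of the symmetric group on the argument positions.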

Recently Kondor [13] investigated the use of
group-theoretic methods to improve some modern machine
learning techniques. Other researchers have also applied
symmetries to ANNs for this purpose pQ [25] [33].
Recently Dong and Zhang [8] incorporated group-theoretic
techniques into EAs with populations of ANNs, using
relatively simple operators. The simplicity of A-types
makes them a good setting in which to further implement
and test the application of group-theoretic ideas on a
population of evolving ANNs.

10 Loosely, we can define the activity of a node as the
average number of changes of state per moment it undergoes
when a large random data packet is processed by the
network. Furthermore, we can define the activity of a
subgraph of an A-type as an average of the activity of all
nodes in that subgraph. Note that Teuscher [29, ch 5]
defines activity of A-types and uses this to investigate
the non-linear dynamics of these networks.
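
The loose definition of node activity in footnote 10 can be turned into a small measurement routine. The Python sketch below does so for a simplified machine in which every node computes the NAND of two other nodes and all nodes update synchronously; the list-of-pairs encoding, the all-zero initial state, the treatment of input nodes as being overwritten by random bits each moment, and the name `node_activity` are assumptions made for illustration rather than the conventions of our A-types or of Teuscher's definition.

```python
import random
from typing import List, Sequence, Tuple


def node_activity(wiring: Sequence[Tuple[int, int]], input_nodes: Sequence[int],
                  moments: int, rng: random.Random) -> List[float]:
    """Estimate each node's average number of state changes per moment."""
    if moments < 1:
        raise ValueError("moments must be positive")
    size = len(wiring)
    state = [0] * size          # all-zero start is an assumption
    changes = [0] * size
    for _ in range(moments):
        # Synchronous update: every node takes the NAND of its two inputs.
        new_state = [1 - (state[a] & state[b]) for a, b in wiring]
        # Input nodes are driven by the random data packet instead.
        for k in input_nodes:
            new_state[k] = rng.randint(0, 1)
        for k in range(size):
            changes[k] += int(new_state[k] != state[k])
        state = new_state
    return [c / moments for c in changes]
```

The activity of a subgraph would then be the mean of the returned values over the nodes in that subgraph.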
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